A CONCEPT OF INDEPENDENCE
WITH APPLICATIONS IN VARIOUS FIELDS OF MATHEMATICS 

by Leonid A. Levin 

ABSTRACT 

We use Kolmogorov's algorithmic approach to information theory to define a concept of
independence of sequences, or equivalently, the boundedness of their mutual information. This concept
is applied to probability theory, intuitionistic logic, and the theory of algorithms. For each case, we
study the advantage of accepting the postulate that the objects studied by the theory are independent
of any sequence determined by a mathematical property.
0. PRELIMINARIES

0.1 INTRODUCTION 

The following attempt to define precisely the concept of independence may seem frivolous, for 
there are probably as many different concepts of "independence" in science as there are concepts of 
"freedom" in the humanities. To justify our efforts, we try to demonstrate that this definition can 
be applied in various fields of mathematics. 

The main idea of this work is to formalize, justify and apply the following physical postulate: 
If $\alpha$ is a sequence generated by a process of the physical world, and $\beta$ is a sequence determined by a
property formulated with no reference to events of the physical world, then $\alpha$ and $\beta$ are independent.
This agrees with Church's Thesis, which asserts the recursiveness of any sequence which is both 
mathematically defined and physically realizable, since a recursive sequence is by our definition 
independent even of itself. Our Independence Postulate expresses "autonomy of the physical world", 
its independence of anything outside itself. This corresponds to the idea of causality in physics. 

The sequence of papers in a mathematical journal, or the sequence of oil prices, are examples
of $\alpha$. The sequence of all true assertions in number theory is an example of $\beta$. Below we take as
examples of $\beta$ sequences defined by mathematical properties. In Chapters two through four the
random sequences, the free-choice sequences of intuitionistic theories, and the representatives of
"regular" Turing degrees respectively are considered as $\alpha$. In each of these three cases we show that
accepting the Independence Postulate allows us to radically simplify the corresponding theories. 

The theory is developed in the simplest version which is sufficient for the applications considered 
below. Proofs are presented rather formally and may be omitted at the first reading. 

0.2 AN EXAMPLE 

Recursive function theory allows us to construct analogues of many concepts of classical analysis 
by presuming recursive enumerability of the sets considered. The analogy obtained is quite good 
because intrinsically non-algorithmic methods are the exception rather than the rule in classical 
mathematics. Moreover, the general theory of algorithms is very similar to descriptive set theory. 
(This explains why the main attention of "constructive analysis" has been directed toward the 
search for exotic counter-examples to some theorems of classical analysis. These investigations have 
always remained of narrow special interest.) However, there is an important distinction between the 
constructive and classical theories: the existence of a universal algorithm. The set of r.e. sets is r.e. 
while the set of countable sets is uncountable. As was discovered in [S64, K65], this rather abstract 
difference implies "more concrete" differences and opens new analytical possibilities which have no 
analogies in "non-algorithmic" analysis. Let us explain this with a simple but important example. 

The space $l_1$ of all absolutely convergent number series $p$ ($p \in l_1$ iff $p \in \mathbb{R}^{\mathbb{N}}$ and $\sum_i |p(i)| < \infty$) is
well studied in analysis. Its recursive analogue $\tilde{l}_1 \subset l_1$ consists of the r.e. elements of $l_1$ (i.e. of series
$p$ whose subgraph $\{(r, x): p(x) > r \in \mathbb{Q}\}$ is r.e.). It is known in calculus that $l_1$ has no element that is maximal to
within a constant factor: $\forall p \in l_1\ \exists q \in l_1\ \lim(q(x)/p(x)) = \infty$. In contrast to this, $\tilde{l}_1$ contains
an "absorbing" element $m$ such that $\forall q \in \tilde{l}_1\ \sup(q(x)/m(x)) < \infty$. This will follow from Theorem 1.
This fact is closely connected to the discovery by R.J. Solomonoff and A.N. Kolmogorov of optimal 
coding of finite objects which originated a new approach to information theory, the foundations of 
probability theory, inductive inference and a number of other fields. 

All of these results, which we combine under the name "algorithmic information theory", are 
based on purely analytical features which distinguish the recursive analogues of some spaces of 
analysis from their classical prototypes. The preceding example illustrates this distinction. 



0.3 NOTATION AND ASSUMPTIONS 

There are several natural general contexts for the formulation of this work. They differ in the
space $\Omega$ of objects considered to be carriers of information. In all cases we need a countable family
of functions on $\Omega$, which extract parts of this information. Declaring these functions continuous and
identifying any two objects on which the values of all these functions coincide, we may regard $\Omega$ as
a topological space with a countable basis and with the Kolmogorov property: for any two points an
open set exists containing only one of them (assuming this property for the functions' range). Among
such spaces there is a universal one, i.e., a space containing a homeomorphic image of any other.
Formulating the theory for this space is the most general possibility. If the range of the family of
functions is a metric space, the Kolmogorov property of $\Omega$ is strengthened to complete regularity.
Among completely regular spaces with a countable base, there is also a universal one, $\mathbb{R}^{\mathbb{N}}$. As usual,
considerations look much simpler for a regular space. We even introduce a further, inessential
simplification, namely, that $\Omega$ is totally disconnected (i.e., any two points can be distinguished by
a continuous mapping into a discrete space). Cantor's perfect set $\overline{\mathbb{N}}^{\mathbb{N}}$ (or $\{0, 1\}^{\mathbb{N}}$), which we do in
fact consider, is universal among such spaces. Moreover, what we discuss in most detail is an even
more special case: the space $\mathbb{N}$ of non-negative integers.

We compactify $\mathbb{N}$ to $\overline{\mathbb{N}}$ by adding the symbol "$\infty$". The number $m + (m+n)(m+n+1)/2$ is
called the pair $(m, n)$ of the numbers $m, n \in \mathbb{N}$. This enumeration of the pairs is bijective on $\mathbb{N}^2 \subset \overline{\mathbb{N}}^2$.
The projections $\pi_1$ and $\pi_2$ are the functions on $\mathbb{N}$ such that $n = (\pi_1(n), \pi_2(n))$. Henceforth, $\Omega$ denotes
Cantor's perfect set, represented in the form $\overline{\mathbb{N}}^{\mathbb{N}}$. This form is more convenient than $\{0, 1\}^{\mathbb{N}}$ since
the pairs $(\alpha, \beta)$ and the projections $\pi_1$ and $\pi_2$ are more simply defined on it: $(\alpha, \beta)(i) = (\alpha(i), \beta(i))$, where
$\alpha(i), \beta(i) \in \overline{\mathbb{N}}$ are the $i$-th terms of $\alpha$ and $\beta$.
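The pairing formula and its two projections can be sketched in a few lines. This is only an illustration for finite arguments; the function names `pair` and `unpair` are hypothetical and not part of the text, and the compactifying symbol "oo" is not modelled.

```python
from math import isqrt
from typing import Tuple


def pair(m: int, n: int) -> int:
    """Number of the pair (m, n): m + (m+n)(m+n+1)/2."""
    s = m + n
    return m + s * (s + 1) // 2


def unpair(z: int) -> Tuple[int, int]:
    """Inverse of pair: returns the two projections (pi_1(z), pi_2(z))."""
    # s is the largest integer with s(s+1)/2 <= z, i.e. the value of m + n.
    s = (isqrt(8 * z + 1) - 1) // 2
    m = z - s * (s + 1) // 2
    return m, s - m
```

Integer square roots are used instead of floating point so that the inverse stays exact for arbitrarily large numbers.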

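Let $S_k = \{0, 1, \dots, k, \infty\}^k$ and $S = \bigcup_k S_k$. $\Lambda$ is the empty sequence, $S_0 = \{\Lambda\}$; $x^*$ is the number of
$x \in S$ in a natural effective enumeration of $S$. $l(x)$ is the length of $x \in S$, i.e. the number $k$ such that
$x \in S_k$. If $\alpha \in \Omega$ or $\alpha \in S_n$ and $k \le n$, then $\alpha_k \in S_k$ is the initial segment of $\alpha$ of length $k$ in which all the
terms larger than $k$ are replaced by $\infty$. $x \subset y$ means $l(x) \le l(y)$ and $x = y_{l(x)}$, likewise for $x \subset \alpha$. $\Gamma_x$
is the set of $\alpha \in \Omega$ such that $x \subset \alpha$. The sets $\Gamma_x$ form a countable basis consisting of clopen (i.e.
closed and open) subsets of $\Omega$. It is easy to see that if $\Gamma_x \cap \Gamma_y \ne \emptyset$ then $\Gamma_x \subset \Gamma_y$ or $\Gamma_y \subset \Gamma_x$ (i.e. $y \subset x$ or
$x \subset y$). $B$ is the set of finite binary sequences; $\mathbb{Q}$, $\mathbb{Q}^+$, $\mathbb{R}$, $\mathbb{R}^+$, $\overline{\mathbb{Q}}^+$, $\overline{\mathbb{R}}^+$, $\overline{\mathbb{R}}$, $\overline{\mathbb{Q}}$ are the sets of rational,
nonnegative rational, real numbers and so on respectively; $\lfloor x \rfloor$ is the integer part of $x$.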

While considering topological spaces with natural countable bases we call an open set recursively 
enumerable (r.e.) if it equals the union of an r.e. family of basis sets. A function $F$ with values
in $\mathbb{R}$ we call r.e. if its subgraph, i.e., the set $\{(x, r): r < F(x)\}$, is r.e. We call $F$ recursive if $F$ and
$-F$ are r.e. We shall systematically assert the recursive enumerability of sets without giving the
formal tedious constructions. Similarly, in Chapter 3, we assert the expressibility or provability of
predicates of formal arithmetic without writing out the corresponding lengthy formulas or proofs. 
These assertions can be checked routinely. 

The symbols $\prec$, $\succ$ and $\asymp$ denote inequality and equality to within an additive constant; $\preceq$,
$\succeq$ and $\simeq$ denote these relations to within a constant factor; $\lesssim$, $\gtrsim$ and $\approx$ denote asymptotic
relations (i.e. $f \lesssim g \iff \forall \varepsilon > 0\ \exists a\ (f > a \Rightarrow g > (1 - \varepsilon) f)$). Such expressions as $\sum f$, $\sup f$, $\min f$, etc. denote
the corresponding operations, taken over the values of all free variables of the term $f$.



1. ALGORITHMIC INFORMATION THEORY 

1.1 UNIVERSAL SEMIMEASURE. 

Let $\sigma(x) = \{y: x \subset y,\ l(y) = l(x) + 1\}$. Then $\Gamma_x = \bigcup \Gamma_y$, where $y \in \sigma(x)$. Each finite positive Borel
measure on $\Omega$ is uniquely determined by a function $\mu: S \to \mathbb{R}^+$ such that $\forall x\ \sum \mu(y) = \mu(x)$, where
$y \in \sigma(x)$. We identify this function (giving the measure of the set $\Gamma_x$) with the measure itself. Let us
introduce a somewhat more general concept.

Definition 1: A semimeasure on $\Omega$ is a function $\mu: S \to \mathbb{R}^+$ such that $\forall x\ \sum \mu(y) \le \mu(x)$ $(y \in \sigma(x))$.
Unless otherwise stipulated, we assume that semimeasures are normalized, that is, $\mu(\Lambda) = 1$.

A semimeasure $\mu$ corresponds to the measure $\mu'$ on $\Omega \cup S$, where for $x \in S$, $\mu'(\{x\}) = \mu(x) - \sum \mu(y)$,
where $y \in \sigma(x)$. Any r.e. measure $\mu$ is also recursive, because $\mu(x) = 1 - \sum \mu(y)$, where $l(y) = l(x)$, $y \ne x$.

Theorem 1: 

There exists a largest to within a constant factor r.e. semimeasure: $\exists M\ \forall \mu\ \exists c\ \forall x\ \mu(x) \le c M(x)$.

Proof: We prove, first, that the set of all r.e. semimeasures is r.e. For each finite set $A \subset S \times \mathbb{Q}^+$
a minimal semimeasure $\mu_A$ exists the closure of whose subgraph contains $A$. The norm of this $\mu_A$
(i.e. $\mu_A(\Lambda)$) is evidently computable on a finite $A$. We denote it by $|A|$. Let $f(i, n)$ be a total recursive
function such that for each $i$ the set of values of $f$ is the $i$-th r.e. subset of $S \times \mathbb{Q}^+$ containing $(\Lambda, 1)$.
Let $A(i, n)$ be the set of values of $f$ on the pairs $(i, m)$, where $m \le n$. Let $f'(i, n) = f(i, n)$ if $|A(i, n)| \le 1$,
otherwise $f'(i, n) = (\Lambda, 1)$. Let $A'(i)$ be the set of values of $f'$ on the pairs $(i, n)$. Obviously $\mu_{A'(i)} =
\mu_{A(i, \infty)}$ iff $|A(i, \infty)| \le 1$. Thus, the family $\mu_{A'(i)}$ enumerates all the normalized r.e. semimeasures (and
only them). The semimeasure $\sum_i \mu_{A'(i)}/i^2$ is obviously r.e. and finite, and exceeds any other such
semimeasure to within a constant factor $1/i^2$. Q.E.D.
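The last step of the proof, that a weighted sum over an enumerated family dominates each member up to its weight, can be illustrated on a finite toy family. The sketch below rests on loud assumptions: the names `mixture` and `family` are hypothetical, the toy measures live on a small finite set rather than on $S$, and the weights $1/i^2$ are renormalised to sum to one. The real $M$ is a sum over all r.e. semimeasures and can only be approximated from below, which this toy does not attempt.

```python
from fractions import Fraction


def mixture(family):
    """Return (mix, weights): the weighted sum of a finite family of sub-distributions.

    family[i-1] is a dict x -> mass with total mass <= 1; its weight is
    proportional to 1/i^2, normalised so that all weights sum to one.
    """
    weights = [Fraction(1, i * i) for i in range(1, len(family) + 1)]
    total = sum(weights)
    weights = [w / total for w in weights]
    support = set().union(*family)
    mix = {x: sum(w * mu.get(x, 0) for w, mu in zip(weights, family)) for x in support}
    return mix, weights


# Hypothetical toy family of three (semi)measures on {0, 1, 2, 3}.
family = [
    {0: Fraction(1, 2), 1: Fraction(1, 2)},
    {1: Fraction(1, 4), 2: Fraction(1, 4)},
    {3: Fraction(1)},
]
mix, weights = mixture(family)

# Each member is dominated by the mixture up to the constant factor 1/weight.
dominated = all(mu[x] <= mix[x] / w for mu, w in zip(family, weights) for x in mu)
```

Exact rational arithmetic is used so that the domination check is not blurred by rounding.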

This semimeasure is called universal and denoted as M. It is the central technical concept of 
this work. Being the largest (to within a constant factor) among all r.e. semimeasures, it determines 
the broadest class of sets $A \subset \Omega$ of positive measure.

In mathematical statistics one tries, given $\alpha$, to get a probability distribution $\mu$, for which it
would be reasonable to assert that "$\alpha$ is random with respect to $\mu$". This usually means that some
properties of $\alpha$ (i.e. sets $A \subset \Omega$ containing $\alpha$) are of positive probability with respect to $\mu$. But the
latter assertion is the weakest in the case $\mu = M$. So, we can take $M$ a priori, before studying what
the properties of $\alpha$ really are. For this reason we call $M$ the a priori probability distribution.

Let us express this in other words. Suppose we want to predict the properties of some unknown 
sequence $\alpha$. The assumption that $\alpha$ occurs randomly with probability distribution $\mu$ allows us to
conclude that $\alpha$ will have a property $A$, when the probability $\mu(\Omega \setminus A)$ of the opposite property equals 0.
The class of such properties is the narrowest in the case where $\mu = M$ and therefore these properties
can be presumed before clearing up what $\mu$ really is. Therefore, before determining the nature of a
random process, one may assume a priori such properties of an outcome which are certain to hold 
for the random process with distribution M. This justifies calling M the a priori probability. 

M has all the properties necessary for the construction of an inductive inference theory in 
accordance with the ideas of R.J. Solomonoff [S64], but we cannot go into this question here. In 
further accounts we will consider M as the a priori probability according to the use of this concept 
in statistics. Let us note that if $\mu$ is an r.e. semimeasure, then with probability 1 (by $\mu$) a sequence
$\alpha$ is such that the values $\mu(\alpha_n)$ and $M(\alpha_n)$ agree to within a factor independent of $n$. This property of
$\alpha$ can be used as a definition of the concept of "a sequence random with respect to the probability
distribution $\mu$". We do not attempt to explore fully the properties of $M$ as the a priori probability;
our main application of M is to algorithmic information theory. 



1.2 DISCRETE CASE: COMPLEXITY AND INFORMATION 

Before introducing our concepts for the Cantor set $\Omega$ let us consider the simpler space $\mathbb{N}$. Let
$m$ be obtained by applying $M$ to $\mathbb{N}$, namely, define $m(k) = M\{\alpha: \alpha(0) = k\}$. The definitions of $M$ and
$m$ imply trivially that $m \in \tilde{l}_1$ and is, in fact, the largest element in $\tilde{l}_1$ to within a constant factor.

The complexity of $n \in \mathbb{N}$ is $K(n) = -\lfloor \log_2 m(n) \rfloor$. This function, as it turns out, defines the
length of the shortest code for $n$, using an optimal self-delimiting coding:

An algorithm $A: B \to \mathbb{N}$ is called self-delimiting if $A(p_1) = A(p_2)$ for any $p_1, p_2$ such that $p_2 \subset p_1$
and $A(p_1)$ and $A(p_2)$ are defined. This means that if $A$ has produced a result on some input, it
cannot produce anything different on any continuation of this input. Informally, the algorithm A 
recognizes the end of the "essential part of the input" and pays no attention to further symbols. 
Such an algorithm needs no special symbol, to distinguish the end of input, and its input alphabet 
is consequently "authentically binary". 
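A small decoder makes the self-delimiting property concrete. The sketch below assumes an Elias-gamma style code, chosen here only as a familiar example; it is not the optimal algorithm of Proposition 2, and the names `encode` and `decode` are hypothetical.

```python
def encode(n: int) -> str:
    """Codeword for n >= 0: a unary length prefix, then the binary digits of n + 1."""
    digits = bin(n + 1)[2:]
    return "0" * (len(digits) - 1) + digits


def decode(p: str):
    """Return n if p begins with a complete codeword, else None (undefined).

    The decoder reads only the "essential part" of p and ignores the rest,
    so its result never changes when further symbols are appended.
    """
    zeros = 0
    while zeros < len(p) and p[zeros] == "0":
        zeros += 1
    end = 2 * zeros + 1          # length of the codeword announced by the prefix
    if end > len(p):
        return None              # input ended before the codeword was complete
    return int(p[zeros:end], 2) - 1
```

Because the length is announced by the codeword itself, no end-of-input symbol is needed, which is the sense in which the input alphabet stays binary.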

Proposition 2 (about coding): There exists a self-delimiting algorithm $A$ capable of generating
any $n \in \mathbb{N}$ from an input of length $K(n)$; more precisely $\exists A\ \forall n\ \exists p\ (A(p) = n$ and $l(p) = K(n) + 1)$.

No self-delimiting algorithm $A'$ can be better by more than an additive constant, i.e.
$\forall A'\ \exists c\ \forall n\ \forall p\ (A'(p) = n) \Rightarrow l(p) > K(n) - c$.

Therefore, the value K(n) defines, to an additive constant, the minimal amount of information 
necessary to determine $n$. This corresponds to Shannon's idea that the amount of information in an
event equals the negative logarithm of its probability. K(n) is the full amount of information in n. 

Proof of Proposition 2: The proof is almost obvious. We need to show for an arbitrary function
$K': \mathbb{N} \to \mathbb{N}$ that $2^{-K'} \in \tilde{l}_1$ (see 0.2) iff a constant $C$ and a self-delimiting algorithm $A$ exist such that
$K'(x) + C = \min\{l(q): A(q) = x\}$. Being self-delimiting, $A$ cannot take different values on segments
of the same sequence, and can be considered as a function from $\Omega_2$. The natural measure $B_2$ on $\Omega_2$
is $B_2(\Gamma_q) = 2^{-l(q)}$. If $K'(x) + C = \min\{l(q): A(q) = x\}$ then, obviously, $2^{-(K'(x) + C)} \le B_2\{\alpha: A(\alpha) = x\}$.
Then $\sum 2^{-(K'(x) + C)} \le 1$ and $2^{-K'} \in l_1$. Also $2^{-K'}$ is r.e. because the graph of $A$ is.

Vice versa, if $2^{-K'} \in \tilde{l}_1$ then the set $A = \{(x, n): n > K'(x)\}$ is r.e. Let $\mu(x, n) = 2^{-n}$ if $n > K'(x)$,
and $\mu(x, n) = 0$ otherwise. Then $C = 2 + \lfloor \log_2 \sum \mu(x, n) \rfloor \le 2 + \log_2 \sum 2^{-K'(x)} < \infty$. It is easy to find
a recursive bijection $(x, n) \to q_{x,n}$ of $A$ to a self-delimiting set $A' \subset B$ such that $B_2(\Gamma_{q_{x,n}}) = 2^{-(n + C)}$,
and thus $l(q_{x,n}) = n + C$. The desired algorithm $A$ maps $q_{x,n}$ to $x$. Q.E.D.
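The step "it is easy to find a recursive bijection to a self-delimiting set" is an instance of the Kraft inequality construction. The sketch below is a simplified offline version under stated assumptions: all codeword lengths are known in advance and are processed in sorted order, whereas the proof needs an online assignment as pairs are enumerated. The name `kraft_code` is hypothetical.

```python
from fractions import Fraction


def kraft_code(lengths):
    """Assign prefix-free binary codewords with the requested lengths.

    Requires the Kraft sum of 2^(-length) to be at most 1. Codewords are cut
    from [0, 1) as consecutive dyadic intervals, shortest lengths first.
    """
    if sum(Fraction(1, 2 ** l) for l in lengths) > 1:
        raise ValueError("Kraft inequality violated")
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    codes = [None] * len(lengths)
    pos = Fraction(0)            # left end of the next free interval
    for i in order:
        l = lengths[i]
        # pos is a multiple of 2^(-l) here, so pos * 2^l is an exact integer.
        codes[i] = format(int(pos * 2 ** l), "0{}b".format(l)) if l else ""
        pos += Fraction(1, 2 ** l)
    return codes
```

Each codeword corresponds to an interval of measure $2^{-l}$, mirroring the role of $B_2(\Gamma_q) = 2^{-l(q)}$ in the proof.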

The code of a pair $(n, m)$ of numbers can be shorter than $K(n) + K(m)$ because $n$ and $m$ may
contain mutual information coded only once.

Definition 2: The value $I(n:m) = K(n) + K(m) - K(n, m)$ is called the amount of mutual
information in $n$ and $m$.

Remark: The self-delimitedness of the coding algorithms $A$ implies that $I(n:m) \succ 0$, since the
pair $(n, m)$ can be encoded by $p_1 p_2$, where $p_1$ and $p_2$ are the shortest codes for $n$ and $m$ respectively.
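Since $K$ is not computable, Definition 2 cannot be evaluated directly. As a rough illustration only, one may substitute the output length of a general-purpose compressor for $K$. This substitution is an assumption of the sketch, not something the text proposes, and the names `c` and `info_proxy` are hypothetical; the resulting number is a crude heuristic and can even be slightly negative.

```python
import random
import zlib


def c(data: bytes) -> int:
    """Compressed length in bits, used here as a crude stand-in for K."""
    return 8 * len(zlib.compress(data, 9))


def info_proxy(x: bytes, y: bytes) -> int:
    """Heuristic analogue of I(x:y) = K(x) + K(y) - K(x, y)."""
    return c(x) + c(y) - c(x + y)


# Two unrelated pseudo-random strings (seeded for reproducibility).
rng = random.Random(0)
x = bytes(rng.randrange(256) for _ in range(2000))
y = bytes(rng.randrange(256) for _ in range(2000))
```

With these inputs, `info_proxy(x, x)` is large because the second copy is coded almost for free, while `info_proxy(x, y)` stays near zero.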



1.3 DISCRETE CASE: RANDOMNESS AND INDEPENDENCE 

In order to arrive at the expression I(x:y) from another point of view, let us consider some 
problems connected with the concept of randomness. In 1.1 we mentioned the possibility of
characterizing the properties of sequences occurring randomly with a probability distribution $\mu$ by the
boundedness of the ratio of $M$ to $\mu$ on their segments. Now we touch upon this matter in the simplest
case: $x \in \mathbb{N}$. While considering a probability distribution on a countable set, we usually cannot talk
about "properties random objects must have" since usually only the whole space is of probability 
1. Then we have to consider quantities which must be small on random objects. (This means that 
a given "test" takes large values only with a small probability). 

Let $\mu$ be a recursive measure on $\mathbb{N}$, $\mu: \mathbb{N} \to \mathbb{R}^+$, $\sum \mu(x) = 1$. Let us call a randomness test
with respect to $\mu$, or a $\mu$-test, any r.e. function $\delta: \mathbb{N} \to \mathbb{N}$ which satisfies the Martin-Löf condition:
$\forall n\ \log_2 \mu\{x: \delta(x) > n\} \le -n$. Let $m$ be the universal measure on $\mathbb{N}$, defined in 1.2.

Proposition 3: 

a) For any recursive measure $\mu$ the function $d(x/\mu) = \lfloor \log_2 (m(x)/\mu(x)) \rfloor$ is a $\mu$-test.

b) For any recursive measure $\mu$ and $\mu$-test $\delta$, $\delta(x) \lesssim d(x/\mu)$.

c) $m$ is maximal to within a constant factor among all functions for which a) holds.

Proposition 3 indicates that $d(x/\mu)$ is, in a sense, a universal characteristic of "non-randomness"
and we call it the randomness deficiency of $x$ with respect to $\mu$. Motivations of the concept of
randomness are discussed in Chapter 2. 
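The bound behind item a) is a Markov-type inequality and can be checked on a finite toy example. In the sketch below the distributions are hypothetical stand-ins: `mu` is uniform on eight points and `m` is an arbitrary sub-distribution playing the role of the universal measure, which itself cannot be computed. The helper name `tail_mass` is also an assumption.

```python
from fractions import Fraction


def tail_mass(mu, m, n):
    """mu-measure of the set {x : m(x) > 2^n * mu(x)}."""
    return sum(mu[x] for x in mu if m.get(x, 0) > 2 ** n * mu[x])


# Hypothetical toy data: mu is uniform, m has total mass below 1.
mu = {x: Fraction(1, 8) for x in range(8)}
m = {x: Fraction(1, 2 ** (x + 1)) for x in range(8)}

# For each n the exceptional set has mu-measure at most 2^(-n).
bounds_hold = all(tail_mass(mu, m, n) <= Fraction(1, 2 ** n) for n in range(6))
```

The check succeeds for any choice of `m` with total mass at most one, since on the exceptional set the mass of `m` exceeds $2^n$ times that of `mu`.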

Proof: a) Obviously $d$ is r.e. Let $\mu\{x: \log_2(m(x)/\mu(x)) > n\} > 2^{-n}$. Then $\mu\{x: m(x) > 2^n \mu(x)\} > 2^{-n}$.
Then $m\{x: m(x) > 2^n \mu(x)\} > 2^n \mu\{x: m(x) > 2^n \mu(x)\} > 1$, which contradicts the normality of $m$.

b) Let $\mu'(x) = \mu(x) 2^{\delta(x)}/\delta^2(x)$. Then $\sum \mu'(x) \le \sum_{n = \delta(x)} \mu(x) 2^n/n^2 = \sum_n (2^n/n^2) \mu\{x: \delta(x) = n\} \le
\sum_n (2^n/n^2) 2^{-n} = \sum 1/n^2 < \infty$. Thus $\mu'(x)$ is an r.e. semimeasure. Then $\mu'(x) \preceq m(x)$, which implies
the required inequality.

c) Let $\bar{m}$ satisfy a) as well as $m$. Obviously $\bar{m}$ is r.e. It remains to show that $\bar{m}$ is a
semimeasure, i.e. $\sum \bar{m}(x) < \infty$. Let $\sum \bar{m}(x) = \infty$. Then, obviously, a recursive function $m'(x) \le \bar{m}(x)$
exists such that $\sum m'(x) = \infty$. Let $s(x) = \lfloor \log_2 \sum_{y < x} m'(y) \rfloor$. Let $\mu(x) = m'(x) 2^{-s(x)}/s^2(x)$. Then
$\mu(x)$ is a measure since $\sum \mu(x) \le \sum 1/n^2 < \infty$. Obviously $\mu\{x: \bar{m}(x)/\mu(x) > 2^n\} \ge \mu\{x: m'(x)/\mu(x) > 2^n\}
\ge \mu\{x: s(x) > n\} \ge 1/n^2$, which contradicts the Martin-Löf condition. Q.E.D.

Let two random variables, defined on the same probability space, be independent and have
the same distribution $\mu$. This is equivalent to the fact that their joint distribution is $\mu \otimes \mu$, where
$\mu \otimes \mu(a, b) = \mu(a)\mu(b)$. Suppose the properties of the pair $(x, y) \in \mathbb{N}^2$ correspond to the results of a
random process with distribution $i = m \otimes m$, i.e. $d((x, y)/i)$ is small. What is the intuitive meaning
of this? The same as of the assertion that "$(x, y)$ has the properties of the results of the pair of two
independent random processes, and each of $x$ and $y$ has the properties of the results of a random
process with distribution $m$". The second part of this assertion is vacuously true, since all numbers
have the properties of the results of a random process with the a priori distribution $m$: $d(x/m) = 0$.

Therefore, the smallness of $d((x, y)/i)$ means only that $(x, y)$ has the properties of the pair of
objects generated in an arbitrary way but independently of each other. It is natural to consider the
value $d((x, y)/i)$ as the deficiency of independence. Obviously $d((x, y)/i) = I(x:y)$! This is consistent
with the theorem of classical probabilistic information theory stating that two random variables are
independent if and only if the mutual information between them equals 0. The difference is that the
concepts given above are applicable to the individual values themselves, and not only to probability
distributions (i.e. random variables) on the set of values.



1.4 CONSERVATION OF INDEPENDENCE 

The information $I(x:y)$ has a remarkable property. It does not increase under any random or algorithmic
(deterministic) processing of $x$ or $y$, and hence under any of their combinations. On the one hand this
is natural, since if $x$ contains no information about $y$ then there is little hope of finding out something about
$y$ by subjecting $x$ to various kinds of processing. (Torturing an uninformed witness cannot give
information about the crime!) This may seem to conflict with the common experience that the Monte-Carlo
method solves many problems which are intractable without using a random number generator. The 
clue here is that one can always solve these problems by computing the probability distribution of 
the results of random input processing. For this one needs to consider all possible inputs (instead 
of a single random one) which is an unrealistic volume of work. Even so, theoretically, the Monte- 
Carlo method produces no "absolutely new" possibilities in this respect. 

Theorem 4 (Independence Conservation): 

Let $A: \mathbb{N} \to \mathbb{N}$ be a recursive function, and $\varphi$ be an r.e. measure on $\mathbb{N}$. Then

1) $I(x:y) \succ I(A(x):y)$,

2) $\exp(I(x:y)) \succeq E_{\varphi(z)} \exp(I((x, z):y))$, where $E$ means mathematical expectation.

Proof: $I(x:y) \asymp I((x, A(x)):y)$ since $x$ and $(x, A(x))$ are computable from each other. It remains to
prove that $I((z, x):y) \succ I(x:y)$ for $x = A(z)$. This reduces to $K(x, y, z) \prec K(x, y) + K(x, z) - K(x)$.
We need an elegant lemma of Peter Gacs:

Lemma 1: $K(t, K(t)) \asymp K(t)$.

Indeed, let p be the shortest code for t. Obviously, K(t)=l(p) is computable from p as well as 
t is. Therefore, the complexity of (t,K(t)) equals l(p)=K(t). 

Definition: An r.e. function $m(\cdot/\cdot): \mathbb{N} \times \mathbb{N} \to \mathbb{R}^+$, largest to within a constant factor among those for
which $\sup_y \sum_x m(x/y) < \infty$, is called the universal conditional measure. $K(x/y) = -\lfloor \log_2 m(x/y) \rfloor$.

Lemma 2: $K(x, y) \asymp K(y/(x, K(x))) + K(x)$.

Let $m_\infty(y/x, n) = m(x, y) 2^n$. A recursive sequence $m_k(y/x, n): A_k \to \mathbb{Q}$, nondecreasing in $k$,
exists such that $m_\infty = \sup_k m_k$, where $A_k$ are finite subsets of $\mathbb{N}^3$. Let $\bar{m}(y/x, n) = \sup_k \{m_k(y/x, n): \sum_z m_k(z/x, n) \le 1\}$.
Obviously $\forall x, n\ \sum_y \bar{m}(y/x, n) \le 1$ (thus $\bar{m}(\cdot/\cdot) \preceq m(\cdot/\cdot)$) and $\forall x, n$, if $\sum_y m(x, y) \le 2^{-n}$,
then $m_\infty(y/x, n) = \bar{m}(y/x, n)$. Therefore $\forall x, n$, if $\sum_y m(x, y) \le 2^{-n}$ (i.e. if $m(x) \preceq 2^{-n}$, or $n \prec K(x)$)
then $m(y/x, n) \succeq \bar{m}(y/x, n) = m_\infty(y/x, n) = 2^n m(x, y)$. Thus $K(y/(x, K(x))) \prec K(x, y) - K(x)$.

It remains to prove that $K(y/(x, K(x))) \succ K(x, y) - K(x) \asymp K(x, y) - K(x, K(x))$. This follows from
the facts that $K(x, y) \prec K(y, x, K(x))$, $K(x) \asymp K(x, K(x))$ and $K(y, t) \prec K(t) + K(y/t)$. The latter
inequality holds since $m'(y, t) = m(t) m(y/t)$ is obviously an r.e. semimeasure and then $m'(y, t) \preceq m(y, t)$.
Analogously one obtains $K(x, y, z) \prec K(x, K(x)) + K(y/(x, K(x))) + K(z/(x, K(x)))$.

Now, item 1 follows from the note that
$K(x, y) + K(x, z) - K(x) \asymp K(y/(x, K(x))) + K(x, K(x)) + K(z/(x, K(x))) + K(x, K(x)) - K(x, K(x))$
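Spelled out, the chain behind item 1 can be written as follows. It only restates Lemma 1, Lemma 2 and the three-term inequality obtained just before, with $\prec$, $\succ$ and $\asymp$ read to within an additive constant.

```latex
\begin{align*}
K(x,y) + K(x,z) - K(x)
  &\asymp \bigl[K(y/(x,K(x))) + K(x)\bigr] + \bigl[K(z/(x,K(x))) + K(x)\bigr] - K(x)
     && \text{(Lemma 2, twice)} \\
  &\asymp K(x,K(x)) + K(y/(x,K(x))) + K(z/(x,K(x)))
     && \text{(Lemma 1)} \\
  &\succ K(x,y,z).
\end{align*}
```

This is exactly the inequality $K(x, y, z) \prec K(x, y) + K(x, z) - K(x)$ to which item 1 was reduced.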

For the proof of 2) one needs to show that $m(x, y)/(m(x) m(y)) \succeq E_{\varphi(z)}\, m(x, y, z)/(m(y) m(x, z))$,
which can be reduced to $E_{m(z)}\, m(x, y, z)/m(x, z) \preceq m(x, y)/m(x)$ since $m(z) \succeq \varphi(z)$. Let us transform
it: $\sum_z m(z) m(x, y, z)/m(x, z) \preceq m(x, y)/m(x)$; $\sum_z m(z) m(x) m(x, y, z)/m(x, z) \preceq m(x, y)$. The latter
inequality follows from the obvious ones: $m(z) m(x) \preceq m(x, z)$ and $\sum_z m(x, y, z) \preceq m(x, y)$. Q.E.D.
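Item 1 has a familiar counterpart in classical information theory, where deterministic processing cannot increase Shannon mutual information. The sketch below illustrates only that classical analogue on a toy joint distribution. It does not compute the algorithmic quantity $I(x:y)$, and the names `mutual_information`, `push_forward` and `joint` are hypothetical.

```python
from collections import defaultdict
from math import log2


def mutual_information(joint):
    """Shannon mutual information in bits of a joint distribution {(a, b): prob}."""
    pa, pb = defaultdict(float), defaultdict(float)
    for (a, b), p in joint.items():
        pa[a] += p
        pb[b] += p
    return sum(p * log2(p / (pa[a] * pb[b])) for (a, b), p in joint.items() if p > 0)


def push_forward(joint, f):
    """Joint distribution of (f(X), Y): apply f to the first coordinate."""
    out = defaultdict(float)
    for (a, b), p in joint.items():
        out[(f(a), b)] += p
    return dict(out)


# Toy example: X is uniform on {0, 1, 2, 3} and Y is the parity of X.
joint = {(0, 0): 0.25, (1, 1): 0.25, (2, 0): 0.25, (3, 1): 0.25}
before = mutual_information(joint)
after_halving = mutual_information(push_forward(joint, lambda a: a // 2))
after_parity = mutual_information(push_forward(joint, lambda a: a % 2))
```

Here halving $X$ destroys all information about $Y$, while taking the parity of $X$ keeps it; in neither case does the information exceed its original value.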



Item 2) of Theorem 4 is strengthened by returning from the logarithmic scale to the linear one.
This scale is natural to use with such linear operations as mathematical expectation. Theorem 4 is
formulated to within additive constants independent of $x, y$, but dependent on $A$ or $\varphi$ (bounded by
$K(A)$ and $K(\varphi)$ respectively). Item 2) is related not to $z$ but only to the mathematical expectation over
it. I.e. the information may increase in a random process, but only with negligible probability (by $n$
bits with probability $2^{-n}$). Both reservations are unremovable, since one can increase information
by randomly guessing $n$ symbols of $y$ or by means of an algorithm $A$ already having these symbols
in its program. This does not diminish the meaning of the Theorem, since one should consider this
additional information as having been present originally in the program of $A$ (or "in our luck" in the
case of guessing) rather than arising from the processing of $x$. Theorem 4 excludes any more efficient
possibilities. Processes more complex than those in Theorem 4 can be obtained by combining its items
(e.g. a generalization of 2) by substitution of $\varphi_x(z)$, dependent on $x$, for $\varphi(z)$). Theorem 4 also implies
non-increasing information in any combination of random and deterministic (recursive) processes. 
This supports the Principle below about the conservation of independence in any physically realizable 
process of information transformation. 



The following formulation and discussion of this Principle is a deviation from the formal account, given for 
motivation of the formal results. To confirm the Independence Principle one may say that it is usually possible
to "explain" known physical processes. To "explain" means to reduce them to simpler ones in combination with
recursive and random transformations. General ideas about the development of the physical universe, on the whole, 
also assume that it was originally in a state of random movement of a hot plasma and then was transformed according 
to the (recursive) equations of quantum mechanics (additional randomness appears in the observation processes). It is 
clear that, not being a mathematical assertion (the physical world is not defined mathematically), the Independence 
Principle (like, for example, Church's thesis) cannot be proved. 

The Principle will be used in further chapters for the case of infinite sequences, for which independence means 
finiteness of the mutual information. It is clear that such an understanding is not suitable for finite objects. What 
we mean here by the independence of $x, y \in \mathbb{N}$ is the smallness of $I(x:y)$. Thus it is not an absolute property (as in
the case of infinite sequences), but rather a quantitative characteristic. (I(x:y) is the "deficiency of independence", 
see Section 1.3). The prediction that x and y are independent means that for any n the degree of our certainty that 
I(x:y)<n is the same as that the first n results of the "honest" toss game will not consist of total zeros. So, 

Independence Principle: 

If x is a sequence generated by a process in the physical world, and y is one determined 
(ineffectively) by a property P(y), formulated with no reference to events of the physical world, 
then x and y are independent to within the formulation length of P, i.e. I(x:y) — l(P) is small. 

This Principle is not trivial only for those (ineffective) properties P which determine sequences with complexity 
essentially bigger than the length of $P$. As an example, $x$ might be the collection of publications of the American
Mathematical Society and y might be the list of all true arithmetical assertions of a length less than 1,000,000,000. 



1.5 RANDOMNESS AND INFORMATION FOR INFINITE SEQUENCES 

For the perfect extension of our concepts to the space $\Omega$ one would need a non-intuitive technique
of functional analysis. The following notions are not perfect, but clearly connected to the preceding 
sections. The next definition is a version of a definition from [L73]. In the special case of the uniform 
measure it is equivalent to a definition from [Ch75]. 

Definition 3: The value $D(\alpha/\mu) = \lfloor \log_2 \sup_n (M((\alpha_n)^*)/\mu(\alpha_n)) \rfloor$ is called the deficiency of
randomness of $\alpha \in \Omega$ with respect to a semimeasure $\mu$.

Definition 4: The value $I(\alpha:\beta) = D((\alpha, \beta)/M \otimes M)$ is called the amount of information in $\alpha$
about $\beta$ or the deficiency of their independence.

Any r.e. measure is recursive and thus computable with any accuracy, but r.e. semimeasures of
a segment of a sequence can in general be effectively approached only from below. In particular, any
r.e. set of recursive upper bounds to $M$ is bounded from below. But it may be known about some $\alpha \in \Omega$
that on its segments the r.e. semimeasure $M$ agrees with some r.e. measure $\mu$ to within a constant
factor. Then, computing $\mu$ we can find $M(\alpha_n)$ to within a factor (or $K(\alpha_n) = -\lfloor \log_2 M(\alpha_n) \rfloor$ to within
an additive constant). Such $\alpha$ we call complete, denoted $\alpha \in C$ or $C(\alpha) \iff \exists \mu\ \sup(M(\alpha_n)/\mu(\alpha_n)) < \infty$.
This means that $\alpha$ contains all information necessary for computation of the complexity of its segments.
By Proposition 5, $C$ is very extensive. By virtue of its item 2), any sequence $\alpha$ satisfying the
Independence Principle (as $x$) has a completion $(\alpha, \beta) \in C$, satisfying this Principle as well.

Proposition 5: 

1) The set $C$ is closed under the application of any total recursive operators ($A(C) \subset C$) and
the complement of $C$ is of measure 0 in any recursive measure.

2) Let $\gamma$ be a sequence to which a universal r.e. set is Turing reducible and $\alpha$ be independent
of $\gamma$. Then $\beta \in \Omega$ exists such that $(\alpha, \beta)$ is complete and independent of $\gamma$.

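Proof: Let $\delta_\mu(\alpha) = \log_2 \sup(M(\alpha_n)/\mu(\alpha_n))$. Analogously with Proposition 3, $\delta_\mu$ is a Martin-Löf
$\mu$-test (Definition 5). Let $\alpha \in C$. Then $\exists \mu: \delta_\mu(\alpha) < \infty$. Let $\mu'(x) = \mu\{\alpha: A(\alpha) \supset x\}$. Then $\mu'$ is also an r.e.
measure, and $\delta_{\mu'}(A(\alpha))$ is a Martin-Löf $\mu$-test. Then by virtue of Proposition 6, $\delta_{\mu'}(A(\alpha)) \lesssim D(\alpha/\mu)$.
Obviously $D(\alpha/\mu) \prec \delta_\mu(\alpha)$. By our assumption $\delta_\mu(\alpha) < \infty$. Therefore $\delta_{\mu'}(A(\alpha)) < \infty$ and $A(\alpha) \in C$.
Obviously $\mu(C) = 1$ because $\delta_\mu(\alpha)$ is a Martin-Löf test.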

It remains to prove 2). In section 3.2 of [L70] it is shown that $M$ (like any other r.e. semimeasure)
can be obtained by means of a partial recursive operator $A$ from a recursive measure $\mu$:
$M(x) = \mu\{\alpha: A(\alpha) \supset x\}$. Let $A'(\alpha) = (A(\alpha), t_{A(\alpha)})$, where $t_{A(\alpha)}$ is the sequence of the computation times of the
terms of $A(\alpha)$. The operator $A'$ is total and, hence, $\mu'(x) = \mu\{\alpha: A'(\alpha) \supset x\}$ is a recursive measure.
$M$ is generated from $\mu'$ by the projector $(\alpha, t) \to \alpha$. By Proposition 7, $\mu'\{(\alpha, t): I((\alpha, t):\gamma) = \infty\} = 0$.
Also $\mu'(\Omega \setminus C) = 0$. Therefore, $M\{\alpha: \forall t \in \Omega\ (((\alpha, t) \notin C) \lor I((\alpha, t):\gamma) = \infty)\} = 0$. By Lemma 3, for
any set $A$ such that $M(A) = 0$, a sequence $\beta$ exists on which all elements of $A$ depend. The same
holds for any sequence to which $\beta$ is reducible. Using reducibility to $\gamma$ of the universal r.e. set, one
can routinely check that the necessary $\beta$ is computable with respect to $\gamma$. Thus $\gamma$ depends on all
sequences $\alpha$ not completable to a complete sequence $(\alpha, t)$ independent of $\gamma$. Q.E.D.

The sequence $\alpha$ comes from $(\alpha, \beta)$ by a partial recursive (but not total) projection operator. As V'jugin
[L77] has shown, partial operators can lead out of C if the time of their work is bounded by no total 
recursive operator. In Chapter 3 we postulate, along with a version of the Principle of independence 
conservation, an axiom that means intuitively that every sequence in the physical world comes from 
a complete one as a result of the application of a partial recursive operator. 



1.6 TIME OF COMPUTATION 

In this work the computational resources necessary for enumeration of various r.e. sets are, as 
a rule, ignored. Now we touch on this question briefly. Let $t_{A(p)}$ denote the time of the computation of $A(p)$. If
$K(s) < n$, then $\exists q: l(q) < n,\ A(q) = s$, where $A$ is the optimal algorithm from Proposition 2. This $q$ can
be found by searching through all words shorter than $n$. This requires very large (exponential) time,
even if $t_{A(q)}$ is linear. Now we give the optimal (by time) algorithm for searching for $q$. We assume
that algorithms are realized by the storage modification machines of Kolmogorov-Uspenskii.

Let $Kt_A(x/y) = \min\{l(p) + \log_2 t_{A(p, y)}: A(p, y) = x\}$, where $p$ is a binary sequence without
termination mark: the algorithm $A$ can receive, by request, the symbols of $p$ in order until $p$ is ended;
in case of further requests $A$ gets no reply and gives no output. $Kt_A(x) = Kt_A(x/\Lambda)$. Analogously
with Theorem 1, an algorithm $A$ exists such that $Kt_A$ is minimal up to an additive constant, and we
will denote $Kt_A$ by $Kt$. There exists an algorithm $G(n, y)$ generating the list $\{x: Kt(x/y) = n\}$ in time
$2^n$; and up to a constant factor, $Kt$ is a minimal function with this property. (The asymptotically
minimal one is $kt(x) = \min\{Kt(a): x \in a \subset \mathbb{N}\}$, $|a| < \infty$.)

Let $R(s, q)$ be a predicate, recognizable in time $t_R(s, q) < P(l(q))$, where P is a polynomial. The
problems of finding q (if it exists) satisfying $R(s, q)$ are called search problems, and the problems of
discerning $\exists q (R(s, q) \,\&\, l(q) < P(l(s)))$ are called the NP-problems. Without loss of generality one can
consider P linear by adding zeros to s and q. Searching through all q in the order of increasing $Kt(q/s)$
(instead of $l(q)$) gives the fastest algorithm (up to a constant factor) for solving any search problem
(see [L73a]; related ideas also have been expressed by L. Adleman). In particular, it is optimal for
finding q such that $A'(q) = s$, $l(q) < n$, $t_{A'}(q) \sim l(s)$. For the case of large $t_{A'}(q)$ a similar algorithm works:
namely, in the definition of Kt, replace the expression $t_A(p, y)$ by $t_A(p, y) + t_{A'}(x)$, where $x = A(p, y)$.
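
The search in order of increasing Kt can be sketched in code. The following is a minimal illustration under stated assumptions, not the algorithm of [L73a] itself: the interpreter `run` is a hypothetical toy in which a program p simply outputs itself after l(p) steps, `check` stands for the predicate R(s, q) with s fixed, and the programs are plain binary words rather than self-delimiting ones, so a phase costs about n 2^n steps instead of 2^n. In phase n every program p with l(p) <= n is run for at most 2^(n - l(p)) steps, so a solution produced by p in time t is found in the first phase n >= l(p) + log2 t.

```python
from itertools import product


def run(p, budget):
    """Toy interpreter (an assumption for illustration only): the program p
    outputs itself and needs max(1, len(p)) steps. Returns None when the
    step budget is too small."""
    steps = max(1, len(p))
    return p if steps <= budget else None


def levin_search(check, max_phase=20):
    """Try programs in order of increasing l(p) + log2(time).

    Returns (q, phase) for the first output q accepted by check,
    or None if nothing is found up to max_phase."""
    for n in range(1, max_phase + 1):
        for length in range(0, n + 1):
            budget = 2 ** (n - length)  # time allowed to programs of this length
            for bits in product("01", repeat=length):
                p = "".join(bits)
                q = run(p, budget)
                if q is not None and check(q):
                    return q, n
    return None


def is_factor_of_221(q):
    """Hypothetical search predicate R(s, q) for s = 221: q encodes a proper divisor."""
    if not q:
        return False
    d = int(q, 2)
    return 1 < d < 221 and 221 % d == 0


result = levin_search(is_factor_of_221)
```

In this toy model the divisor 13 of 221 is the word 1101 of length 4 and takes 4 steps, so it is reached in phase 4 + log2 4 = 6 and `result` is `("1101", 6)`.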

Functions of the type Kt are of particular interest for the case of algorithms with random
number generators. For $f: \mathbb{N} \to \mathbb{R}^+ \cup \{\infty\}$ let $C(f) = -\log_2 \int d\omega / (t_A(\omega) + f(A(\omega)))$, where $\omega$ is the random
variable and, as above, A is the optimal algorithm minimizing C. For $F \subset \mathbb{N}$, $C(F)$ means $C(f)$,
where $f(x) = 0$ if $x \in F$, else $f(x) = \infty$. The above algorithm $G(n, \omega)$, generating numbers x randomly,
hits any $F \subset \mathbb{N}$ in time $< 2^n$ with the probability $p > 1 - 2^{-a}$, where $\log a \asymp n - C(F)$. Obviously, for
any such algorithm: $p < 2^{n - C(F)}$. Thus $C(F)$ determines the time which is necessary, and essentially
guaranteed, for "hitting" F. A function f with range other than just $\{\infty, 0\}$ can be interpreted as a
"price" (for instance, the time) necessary for establishing $x \in F = f^{-1}(\mathbb{R}^+)$. Everything is analogous
for $C(f/y) = -\log_2 \int d\omega / (t_A(\omega, y) + f(A(\omega, y), y))$.

A number of search (NP) problems are known which have no proof of quick solvability by deterministic
algorithms, but are very quickly solvable by probabilistic ones. E.g. integer compositeness
and constructing "non-compressible" words q, i.e. ones equivalent to no essentially (say twice) shorter
word p (q and p are equivalent if they are transformable into each other by a simple and quick
algorithm). The complexity of the search problem $R(s, q)$ for probabilistic algorithms is characterized
by $C(f_s/s)$, where $f_s(q) = t_R(s, q)$ if R holds, and $f_s(q) = \infty$ otherwise. The relationship of this complexity
with $l(s)$ is a "randomized" version of the P=NP problem. But the problem of its relationship
with the "complexity of obtaining s" looks much more interesting. More accurately: how does the
function of n, $C(\{s : \infty > C(f_s/s) > n\})$, grow, polynomially or exponentially? A short s may exist for
which it is very difficult to find q such that $R(s, q)$, but to find such an s may be even more difficult.

Other results about the computation time may be obtained by diagonal methods. E.g. the 
results of [L73b] for Turing machine space remain valid for many other types of complexity: time 
of storage modification machines, exponent of Turing machines space etc. 

2. APPLICATION TO THE FOUNDATIONS OF PROBABILITY THEORY

2.1 FOUNDATIONAL DIFFICULTIES (A historical digression) 

This section is not formally related to the work. Its purpose is to clarify the context in which the problems to 
be studied in Chapter 2 arise (in particular the problem of the introduction of the concept of "randomness"). 

Hilbert's sixth problem suggests "To treat in the same manner (as geometry), by means of axioms, 
those physical sciences, in which mathematics plays an important part; in the first rank are the 
theory of probabilities and mechanics." (see [H02]) 

It was generally considered that this problem was completely solved in A.N. Kolmogorov's 1933 book [K33].
However, this is only partially so. Kolmogorov's work opened great possibilities for the development of techniques of
probabilistic methods and their applications. At the same time certain foundational difficulties were left unresolved.
Kolmogorov noted this in the foreword to the second Russian edition of the book, where he refers to works by
Kolmogorov, Zvonkin and Levin [K65, L70] for his new approach. The well-known previous attempts to overcome
these difficulties by J. von Mises [vM64] and A. Church [C40] turned out to be imperfect.

The difficulties lie in the gap between intuitive probabilistic ideas and those methods which are justifiable
theoretically. The premise for the use of probabilistic methods is the assumption that the result x of a physical
process arises randomly with probability distribution $\mu$. This $\mu$ is discovered or hypothesized e.g. by analogy with
other processes and statistical data about them, considerations of symmetry, etc. Then, according to the naive
ideas, those properties of x are indicated as probabilistic laws whose $\mu$-probability is 1 (approximately, in the finite
case). E.g. when $x = x_1, \ldots, x_n$, where $x_1, \ldots, x_n$ are independent and identically distributed
(i.e. $\mu(x_1, \ldots, x_n) = \mu'(x_1)\mu'(x_2)\cdots\mu'(x_n)$), the law of large numbers plays an important role. For each property B it asserts that with
$\mu$-probability close to 1, the frequency with which $B(x_i)$ is realized is close to the probability $\mu'(B)$. In any case, it is
predicted that x is subject to such laws, i.e. has properties whose probability is 1 (approximately, in the finite case).

The problem is that jointly the properties of probability 1 have probability 0! We cannot predict
the realization of all of them simultaneously, but we should choose one or a few of them for prediction. Thus if the
result had arisen before we managed to make a prediction, we could not expect to subject this result to any statistical
tests. For example, classical theory provides no rigorous basis to doubt the honesty of the lottery director after his
son wins the first prize in ten consecutive years, if we discover this "post factum"! We cannot subject an election
to criticism when the shares of votes for the ruling party in a series of consecutive years formed a sequence 0.99$k_i$,
even if the $k_i$ turn out to be the digits of the decimal expansion of the number $\pi$! Of course, one can select a few
"standard laws" and presume their predictions if before the beginning of the experiment this selection was not changed.
However, standard probability theory contains no principles which would allow the distinction of
such standard laws from others. Besides, it would not solve the problem of applying probability theory to events
which had occurred before such a standardization (for example, to cosmology, history, geology, etc.).

The idea of solving this paradox consists in considering as "standard" those properties of probability close to 1 (in
the finite case) which are "simply expressible". The objects not satisfying such a property form a simply expressible set
of small measure and correspondingly small cardinality. Thus any element of such a set is itself simple, being determinable
by its number (smaller than the set cardinality) together with the simple set description. This allows us, instead of indicating
many simple "standard" properties, to consider a single one: "not to be a simple object". Kolmogorov's algorithmic
information theory, a surprising discovery, provided a rigorous basis for the obscure notion of simplicity. In the
infinite case the corresponding property is "to be random with respect to distribution $\mu$". Then only this property
is postulated to follow from the assumption about the random occurrence of an object in a process with distribution
$\mu$. This property is of $\mu$-measure 1 and implies all other "good" properties of $\mu$-measure 1. Attempts to introduce
such a concept were also undertaken by von Mises and continued by Church for distributions $\mu$ of the Bernoulli type
[vM64, C40]. However, it was found [V39] that even such standard properties as the law of the iterated logarithm do
not follow from their notion of randomness (i.e. from the property of being a collective).

2.2 THE LAWS OF RANDOMNESS AND INDEPENDENCE

We consider below two types of properties of sequences $\alpha \in \Omega$. These properties are the laws
of probability theory in the usual classical sense (i.e. the probability of their violation equals 0). 
For simplicity we restrict ourselves to the case of recursive probability distributions, though these 
results can be generalized to non-recursive cases as well. 

The Law of Randomness: Let us say that $\alpha \in \Omega$ satisfies this law (is random) with respect to
a semimeasure $\mu$, if $D(\alpha/\mu) < \infty$ (D is given in Definition 3).

This means that the values of $\mu$ on segments of $\alpha$ are not much smaller than those of the a priori
probability M (i.e. the hypothesis that $\alpha$ has occurred randomly with probability distribution $\mu$ is
at least as consistent with reality as the a priori idea that it occurred with the distribution M).
As we will see below, this property implies fulfillment of all effective probabilistic laws.

Definition 5: An r.e. function $\delta: \Omega \to \mathbb{N} \cup \{\infty\}$ is called a Martin-Löf test with respect to a recursive
measure $\mu$ (or a $\mu$-test) iff $\forall n\ \log_2 \mu\{\alpha : \delta(\alpha) > n\} < -n$.

It is said that a sequence $\alpha$ withstands the test if $\delta(\alpha) < \infty$. Definition 5 is a formalization of the
concept of a "good" law of probability theory. The value $\delta$ means the degree of deviation from such
a law. Complete deviation occurs at $\delta(\alpha) = \infty$, the probability of which is 0. The deviations can be
effectively discovered since $\delta$ is r.e. The logarithmic scale of deviations is chosen for convenience
(the definition serves equally well with any other recursive scale).
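
Definition 5 can be illustrated by a simple example chosen here for illustration only. For the uniform measure on binary sequences, let the test value of a sequence be the number of its leading zeros. This function is r.e., since the count can only grow as longer prefixes are read, and the set of sequences with more than n leading zeros consists of those beginning with n+1 zeros, so its measure is 2^(-n-1) < 2^(-n). Only the all-zero sequence fails the test. The sketch below checks the bound exactly by enumerating prefixes; the function names are hypothetical, and the inequality of Definition 5 is read as: log2 of the measure of {sequences with test value > n} is less than -n.

```python
from fractions import Fraction
from itertools import product


def delta(prefix):
    """Number of leading zeros of a finite 0/1 prefix.

    For an infinite sequence this is a lower bound on the test value that
    can only increase as the prefix is extended."""
    count = 0
    for bit in prefix:
        if bit != 0:
            break
        count += 1
    return count


def measure_exceeding(n, length):
    """Uniform measure of {sequences with more than n leading zeros}.

    Computed exactly by counting prefixes of the given length
    (meaningful when length > n)."""
    hits = sum(1 for w in product((0, 1), repeat=length) if delta(w) > n)
    return Fraction(hits, 2 ** length)


# The test condition: the measure of {delta > n} is below 2^(-n) for each n checked.
bounds_hold = all(measure_exceeding(n, 10) < Fraction(1, 2 ** n) for n in range(8))
```

A sequence such as 0001... has test value 3 and withstands the test, while the deviation of the all-zero sequence grows without bound.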

Proposition 6: Let $\mu$ be a recursive measure. Then

1) $D(\alpha/\mu)$ is a Martin-Löf test with respect to $\mu$.

2) For any Martin-Löf test $\delta$ we have $\delta(\alpha) \preceq D(\alpha/\mu)$.

(The proof is analogous to the proof of Proposition 3.) We see that if $\alpha$ withstands test D with
respect to measure $\mu$ then it withstands all conceivable $\mu$-tests. These tests correspond to the "good"
laws of probability theory. What is the situation with the bad ones? This is interesting because it 
clarifies the relation between the algorithmic and classical approaches to probability theory. Let us 
give an important example of non-recursive laws. 

The Law of Independence: Let $\gamma \in \Omega$. We say that $\alpha \in \Omega$ satisfies this law if $\alpha$ and $\gamma$ are
independent, i.e. $I(\alpha:\gamma) < \infty$. Then $I(\alpha:\gamma)$ is "the degree of deviation" from this law.

Proposition 7: For any $\gamma \in \Omega$ and r.e. semimeasure $\mu$, the value $I(\alpha:\gamma)$ satisfies the relation
$\log_2 \mu\{\alpha : I(\alpha:\gamma) > n\} \preceq -n$. (Compare this inequality with the one in Definition 5.)

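Proof: It is sufficient to show that $\log_2 M\{\alpha : I(\alpha:\beta) > n\} \preceq -n$. Let
$DN(\alpha/\mu) = \log_2 \sup_{x \subset \alpha} \inf_{x' \supset x} (M(x')/\mu(x'))$, and $IN(\alpha:\beta) = DN((\alpha, \beta)/M \otimes M)$. Let us prove first for any r.e. semimeasure
$\mu$ that $DN(\alpha/\mu) \asymp D(\alpha/\mu)$. Let $\mu_{t,x}$ be a recursive sequence of semimeasures, non-decreasing in t,
such that $\Gamma_{x'} \cap \Gamma_x = \emptyset \Rightarrow \mu_{t,x}(x') = 0$; $l(x') > t \Rightarrow \mu_{t,x}(x') = 0$; $\Gamma_{x'} \cap \Gamma_x \neq \emptyset \Rightarrow \sup_t \mu_{t,x}(x') = \mu(\Gamma_{x'} \cap \Gamma_x)$. Let
$t(x, n) = \sup \{t : \mu_{t,x}(\Lambda) < 2^{-n} m(x)\}$; $\mu'_{n,x} = \mu_{t(x,n),x}$ and $\mu' = \sum (2^n/n^2) \mu'_{n,x}$. Obviously, $\mu'$ is an r.e.
semimeasure and, hence, $\mu' \preceq M$. Besides, $\forall x, x' ((x \subset x';\ m(x)/\mu(x) > 2^n) \Rightarrow (\mu'(x')/\mu(x') > 2^n/n^2) \Rightarrow (M(x')/\mu(x') \succeq 2^n/n^2))$.
Then, by the definitions of D and DN, $D(\alpha/\mu) > n \Rightarrow DN(\alpha/\mu) \succeq n - 2 \log_2 n$.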

It remains to show that $\log_2 M\{\alpha : IN(\alpha:\beta) > n\} \preceq -n$. Let $A_{n,\beta} = \{\alpha : IN(\alpha:\beta) > n\}$ and
$M(A_{n,\beta}) > 2^{-n+c}$. Being an open set, $A_{n,\beta}$ has a clopen subset A' such that $M(A') > 2^{-n+c}$. Then
k and $T \subset S^k$ exist such that $A' = \bigcup \{\Gamma_x : x \in T\}$ and thus $\forall x \in T$: $\log_2 (M(x, \beta_k)/M(x)M(\beta_k)) > n$;
$2^{-n+c} < \sum \{M(x) : x \in T\}$. Then $\sum M(x, \beta_k) > 2^n M(\beta_k) \sum M(x) > 2^n M(\beta_k) 2^{-n+c}$, and therefore
$\sum M(x, \beta_k) > 2^c M(\beta_k)$, which is impossible for c large enough. Q.E.D.

2.3 COVERING OF THE CLASSICAL FORMULATION OF PROBABILITY THEORY

The law of independence (as well as of randomness) is violated only with probability 0, and thus 
it is a law of probability theory in the customary "classical" sense. Its meaning is that a randomly 
generated sequence must be independent of a sequence given beforehand. This law depends only on 
the parameter $\gamma$, and nothing is assumed about $\gamma$ except that it is given in advance (e.g. is
uniquely defined by some property in the language of our law formulation, e.g. in formal arithmetic).
Note that in the formulation of this law ($I(\alpha:\gamma) < \infty$) the probability $\mu$ is not mentioned at all.
This property of $\alpha$ can be prescribed before specifying the probability distribution of the process
generating $\alpha$ (i.e. this property holds independently of the parameters of probability theory).

In section 1.4 we have seen that this law is more general than the usual laws of probability 
theory. The Independence Principle in 1.4 asserts that this law is realized not only in the usual 
random processes, but in any process of the physical world as well. This encourages one to bring this 
law outside the limits of probability theory and to consider other probabilistic laws only for those 
sequences which satisfy the law of independence. It turns out that this makes other laws unnecessary! 

Theorem 8: Let $\mu$ be an r.e. measure. For any set A such that $\mu(A) = 0$, such a $\gamma$ exists
that $A \subset \{\alpha : D(\alpha/\mu) = \infty\} \cup \{\alpha : I(\alpha:\gamma) = \infty\}$, i.e. any probabilistic law (in the classical sense) is
reduced to the two laws considered above.

If we confine ourselves to sequences satisfying the law of independence (this includes, in accordance
with the Independence Principle, any sequence in the physical world) then any law (recursive
or not) of classical probability theory is reducible to the "law of randomness".

Proof of Theorem 8: It is easy to see that $\delta_M(\alpha) = \log_2 \sup_n (M(\alpha_n)/\mu(\alpha_n))$ is a Martin-Löf test.
Therefore if $D(\alpha/\mu) < \infty$, then $\exists c \forall n\ M(\alpha_n) < c\mu(\alpha_n)$. Let $A' = A \cap \{\alpha : D(\alpha/\mu) < \infty\}$. It is clear
that $M(A') = 0$ follows from $\mu(A') = 0$. Thus it is sufficient to prove the next

Lemma 3: 

For each set A' such that $M(A') = 0$ a sequence $\beta$ exists on which all elements of A' depend.

Obviously a sequence $A_m \supset A'$ of open sets exists such that $M(A_m) < 2^{-m}$. Let $A'_m \subset S \times \mathbb{Q}^+$
be such that $((x, r_1), (y, r_2) \in A'_m,\ x \subset y) \Rightarrow (x = y,\ r_1 = r_2)$; $A_m = \{\alpha : \exists (x, r) \in A'_m,\ x \subset \alpha\}$;
$(x, r) \in A'_m \Rightarrow M(x) < r < 2M(x)$. Let $\beta$ be a sequence with respect to which $A'_m$ is r.e. Then a recursive set T
exists such that $(x, r) \in A'_m \Leftrightarrow \exists y \subset \beta\ (x, y, m, r) \in T$; $\forall m \forall \alpha \in \Omega\ \sum \pi_4(s) < 2^{-m+1}$, where $s \in T$, $\pi_2(s) \subset \alpha$,
$\pi_3(s) = m$ (if $s = (a_1, a_2, a_3, a_4)$ then $\pi_i(s) = a_i$). Now we shall replace T by a set T' in such a way as to make
$(x, y, m, r) \in T' \Rightarrow l(x) = l(y)$ fulfilled. First we replace each quadruple $(x, y, m, r)$, where $l(x) > l(y)$,
by the set of all quadruples $(x, y', m, r)$, where $l(y') = l(x)$, $y' \supset y$. Then we replace each quadruple
$(x, y, m, r)$, where $l(y) > l(x)$, by the set of the quadruples $(x', y, m, r')$, where $l(x') = l(y)$, $x' \supset x$, and
$r' < M'_{y,x,r}(x')$. $M'_{y,x,r}(x')$ is given in the following way. We generate evaluations from below
of the numbers $M(x')$ ($x' \supset x$, $l(x') = l(y)$) until their sum exceeds r. If this happens we stop the process
on the previous step and the result will be $M'_{y,x,r}(x')$. Otherwise $M'_{y,x,r}(x') = M(x')$. Let T'' be
the set of triples $(x, y, m)$ such that $\exists r\ (x, y, m, r) \in T'$. If $s \in T''$, then $r(s) = \sup \{r' : (s, r') \in T'\}$. The
obtained T'' and r satisfy the following conditions:

1) $(x, y, m) \in T'' \Rightarrow l(x) = l(y)$

2) $A_m = \{\alpha : \exists (x, y, m) \in T'',\ y \subset \beta,\ x \subset \alpha\}$

3) $y \subset \beta \Rightarrow r(x, y, m) > M(x)$

4) $\forall m \forall \beta\ 2^{-m+1} > \sum r(x, y, m)$, $y \subset \beta$

It follows from 4) that $\sum r(x, y) < \infty$, where $r(x, y) = \sum_m (2^m/m^2) r(x, y, m) M(y)$.
Therefore, $r(x, y) \preceq M((x, y))$. Obviously $\forall \alpha \in A_m\ \exists x, y$: $l(x) = l(y)$, $x \subset \alpha$, $y \subset \beta$, $r(x, y, m) > M(x)$.
Hence $\forall \alpha \in A_m\ I(\alpha:\beta) \succeq \log_2 (2^m/m^2)$. Then $\forall \alpha \in A'\ I(\alpha:\beta) = \infty$. Q.E.D.

3. APPLICATIONS TO INTUITIONISTIC MATHEMATICS

3.1 A DIGRESSION 



It is known that second order theories are much more complicated logically than those of the first order. The 
theories permitting free handling of elements of a continuum (in particular, quantification over them) belong to the 
second order. First order theories admit quantification only over constructive objects. 



At the beginning of this century a number of mathematicians (the intuitionists) suggested that these complications
are artificial and even dangerous in the sense of the possibility of paradoxes. In particular, they asserted that elements
of a continuum (unlimitedly elongated sequences, unlimitedly small points, etc.) do not make sense as logically defined
formal objects, but are taken from the physical world. Therefore, the applicability of usual logical operations to them
is not a priori obvious when these operations have no analogies in the physical world. For example, in order "to
apply" a classical universal quantification, one would need the ability to scan all conceivable sequences; this is not,
of course, physically implementable. It was suggested that, having restricted our logical means only to such formal
procedures and postulates that have closer connection with "physical intuition", we would obtain a mathematics whose
proof power is more mensurable and less suspicious. The evident difficulty is in the obscurity of our physical intuition.
This brings up difficulties in the choice of the formal principles which would adequately reflect the nature of sequences
generated as a result of events of the physical world. Brouwer's original idea of sequences generated by "free
choice" of their terms does not clarify the situation enough, since the concept of "freedom" is itself obscure. The result is
a great variety of intuitionistic principles and theories that strengthen, weaken or contradict each other.



As a rule, these theories are too strong on the one hand, but too weak on the other. They are strong to the
extent that the connection of their principles with physical intuition ceases to be obvious. This is aggravated by the
fact that, with respect to the possibility of inconsistency, these theories often turn out to be equivalent
to the corresponding classical ones (which kills the hope for the increased "reliability" of intuitionism). They are weak
to the extent that they leave unsolved many natural questions about the validity of other principles of intuitionistic
reasoning. The latter fact generates multiform possibilities of extending these theories, and provides abundant material
for research. However, this eliminates the possibility of obtaining a theory which gives us some feeling of completeness
and is suitable for "canonization" as the universal foundation of intuitionistic mathematics.



In this section we will try partly to overcome these difficulties by using an axiom schema which corresponds to 
the Independence Principle (see 1.4). On the one hand, this Principle seems to have more tangible (physically clear) 
foundations than many arguments about the nature of "free choice". It turns out that with respect to consistency 
and mensurability of the proof power, the theory obtained is equivalent to the classical first order arithmetic. More 
accurately, the intuitionistic second order arithmetic (analysis) considered below is a conservative extension of 
the classical first order arithmetic, formulated without disjunction and existential quantification. On the other hand, 
it is in a sense complete. More accurately, it has no essential extension which would retain the indicated property of 
conservativeness (i.e. an extension gotten by adding an essentially new principle which is "purely logical" i.e. implies 
no new theorems of classical number theory). All these "virtues" of the theory below are connected with the fact that
the Independence Postulate excludes the existence of sequences containing unbounded information about the truth of 
mathematical statements. It is natural to attribute the usual troubles of second order theories to such fancy "logical" 
sequences which in fact do not exist in the physical world. 

3.2 THE PRELIMINARY CALCULUS A

Our theory AI will be constructed in Section 3.3 by adding a group of axioms to the basic calculus
A, described below. A is formulated in the usual language of second order arithmetic. This language
is obtained from that of first-order arithmetic (see [KL67], section 38) by adding a countable list
of second order variables denoting sequences (functions) of natural numbers and adopting the term
$\alpha(t)$ and formulas $\forall \alpha F$ and $\exists \alpha F$ for any second order variable $\alpha$, term t, and formula F. A formula
is called absolute if it is constructed from equalities between terms with the aid of conjunction,
implication, negation and universal quantification of first order variables. Absolute formulas have
the identical meaning and equivalent provability in intuitionistic and classical theories. In section
0.2 a definition of the pair of numbers is given. In the same fashion we give meanings to the notion of
the pair of terms, expressions $(a_1, \ldots, a_n)$, $\alpha(a_1, \ldots, a_n)$, $(\alpha, \beta)$, etc. Allowing liberties with the language
we use the notation $n = Pr_t \tau$ (n equals the projection on variable t of the term $\tau$) for the fact:
$\exists s (\tau(s) = n+1 \,\&\, (\forall s' < s: \tau(s') = 0))$ (i.e. $n+1$ is the first non-zero term of the sequence $\tau(t)$). Handling
the expression $Pr_t \tau$ like a term will never cause any misunderstanding, in particular thanks to (3.2.2).

The postulates of A consist of the postulates of first order arithmetic (see [KL67], p. 387, List 
of Postulates, Schema 8 is taken in the intuitionistic version 8') and three second order postulates: 

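Schema of Choice: $(\forall n (\neg A \Rightarrow \exists x B(x))) \Rightarrow \exists \alpha \forall n (\neg A \Rightarrow B(Pr_t \alpha(n, t)))$ (3.2.1)

Markov Principle: $(\neg \forall n\ \alpha(n) = 0) \Rightarrow \exists n\ \alpha(n) \neq 0$ (3.2.2)

Axiom of Countability: $\exists \alpha \forall \beta \exists k \forall n\ \beta(n) = Pr_t \alpha(k, n, t)$ (3.2.3)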

Axiom (3.2.3) asserts that the set of intuitionistic sequences is countable. Under the interpretation
of intuitionistic sequences as sequences of results of real macro-events in the physical world,
this axiom corresponds to the customary statement on the existence of a countable basis of open
sets in the space-time. We do not discuss the axioms of A in detail, since they are not original. We
observe only that for the construction of any complete calculus (one that satisfies Theorem 10), it
is necessary to adopt either these axioms (at least under double negation) or their negation or their
equivalence to some undecidable absolute statements of number theory. The last two variants seem
less natural. It is known that (3.2.1 - 3.2.3) are inconsistent with the principles of continuity and
bar-induction. In this respect the calculus A more resembles Kleene's theory of recursive realizability.
Of course, the calculus A is still too weak. Nonetheless, we have

Proposition 9: For any formula F an absolute P exists such that $A \vdash F \Leftrightarrow \forall \alpha \exists \beta P$.

Proof of Proposition 9: The proof is based on the fact that the axioms of A allow us to introduce
a concept analogous to Kleene's recursive realizability, by using the universal sequence $\alpha$ from axiom
(3.2.3). Namely, the concept "a number x realizes a formula F with respect to a sequence $\alpha$" is
defined in the same way as in Kleene's book (cf. Introduction to Metamathematics, Chapter 2),
but recursiveness of all the functions used is replaced by recursiveness with respect to $\alpha$. It is easy
to prove in A the equivalence of any formula F to the existence of a realization of F with respect to
a universal $\alpha$. The latter assertion is equivalent to the fact that for any $\beta$ there exist a sequence $\alpha$
and a number x realizing F with respect to $(\alpha, \beta)$. It is easy (though bulky) to check that A contains
all the axioms necessary for formalizing these arguments, i.e. the deduction of $F \Leftrightarrow \forall \beta \exists \alpha P$, where P
is the absolute formula expressing that $\alpha(0)$ is the realization of F with respect to $(\alpha, \beta)$. Q.E.D.

3.3 THE CALCULUS AI; ITS RELATIVE CONSISTENCY AND COMPLETENESS

Let P(n) be an absolute formula with a single free variable n. A finite binary sequence p is called
compatible with P (denoted $p \subset P$) if $\forall n < l(p)$: $(p(n) = 0 \Leftrightarrow P(n))$. The abbreviation $I(\alpha:P)$ means
$\sup \{I(\alpha:p) : p \subset P\}$. For a given P, the statement $I(\alpha:P) < c$ can be easily expressed by an absolute
formula with free variables $\alpha$, c. Using that we introduce an axiom schema with a parameter P:
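
The compatibility relation can be illustrated on a decidable toy predicate. In the sketch below P(n) is taken to be "n is even", an assumption made only so that the check is executable; the absolute formulas of interest are in general undecidable, and the function names are hypothetical. Note the convention p(n) = 0 <=> P(n): a zero bit encodes truth. The words compatible with P are exactly the initial segments of one infinite sequence, so the supremum defining I(a:P) runs over these segments.

```python
def is_compatible(p, P):
    """True iff for every n < len(p): p[n] == 0 exactly when P(n) holds."""
    return all((bit == 0) == bool(P(n)) for n, bit in enumerate(p))


def compatible_prefix(P, length):
    """The unique binary word of the given length that is compatible with P."""
    return [0 if P(n) else 1 for n in range(length)]


def is_even(n):
    """Toy decidable predicate standing in for an absolute formula P(n)."""
    return n % 2 == 0


prefix = compatible_prefix(is_even, 6)
```

For this toy predicate `prefix` is `[0, 1, 0, 1, 0, 1]`, and any word that differs from it in some position is not compatible with P.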

Independence Postulate: $\forall \alpha \exists c\ I(\alpha:P) < c$ (3.3.1)

One more informational statement must be valid in AI. The property of completeness (defined
in 1.5) is expressible by an absolute formula $C(\alpha)$. Our last axiom asserts, within the bounds of the
theory, the feasibility of the completion of sequences mentioned in Proposition 5:

$\forall \alpha \exists \gamma\ C(\alpha, \gamma)$ (3.3.2)

The double negation of this axiom follows from the weaker statement $\neg \exists \alpha \forall \gamma \neg C(\alpha, \gamma)$ inasmuch
as we can use the existence of a "universal" sequence by axiom (3.2.3). Analogously, the double
negation of (3.3.1) follows from the statement $\neg \exists \alpha \forall c\ I(\alpha:P) > c$. These weaker versions are sufficient
for our purposes, but we chose the formulations (3.3.1) and (3.3.2) because they are simpler.

Definition 6. A theory is called absolute if for every closed formula F an absolute (see 3.2)
formula P exists such that $\neg F \Leftrightarrow P$ is provable in this theory.

The theory of recursive realizability of S.C. Kleene is an example of a theory known to be 
absolute. This is the theory obtained from A by replacing (3.2.3) with 

Church's thesis (CT): $\forall \beta \exists k \forall n$: $\beta(n) = U(k, n)$ (3.3.3)

where U(k,n) is a universal partial recursive function. Condition (3.3.3) is obtainable from (3.2.3)
by imposing the condition of general recursiveness on $\alpha$. Our theory AI is, of course, not absolute,
inasmuch as the formula $\neg\neg$(CT) is not deducible in it, nor is it refutable, nor can it be reduced to
any absolute formula. This formula, however, is the only one of this sort; namely,

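Lemma 4. For any closed formula F four absolute formulas $P_1, P_2, P_3, P_4$ exist such that
these statements are deducible in AI:

$\neg\neg(P_1 \vee P_2 \vee P_3 \vee P_4)$; $\neg\neg(P_1 \Rightarrow F)$; $\neg\neg(P_2 \Rightarrow \neg F)$; $\neg\neg(P_3 \Rightarrow (F \Leftrightarrow (CT)))$; $\neg\neg(P_4 \Rightarrow (F \Leftrightarrow \neg(CT)))$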

Then to get an absolute theory an axiom is necessary implying the truth or the falsity of (CT).
It turns out that this is sufficient as well. The theory AI + (CT) is equivalent to the theory of recursive
realizability of S.C. Kleene and is consequently absolute. It is of little interest for our purposes since
by admitting Church's thesis (3.3.3) we would exclude from consideration all non-recursive sequences
(for instance the random ones). To the degree that (CT) is a very strong axiom, the axiom $\neg$(CT) is,
inversely, very weak. Thus one might not have expected that the theory AI + $\neg$(CT) is also absolute.
This fact follows from the following Theorem.

Theorem 10: The class of absolute closed formulas deducible in AI + $\neg$(CT) coincides with
the class of absolute theorems of the classical first order arithmetic. No essential extension (i.e.
one containing new theorems of the form $\neg F$) of the theory AI + $\neg$(CT) has this property.

Thus the theory AI + $\neg$(CT) is a maximal conservative extension of classical arithmetic. This
property is in a sense consistency and completeness relative to classical arithmetic. The basic goal of
the construction of this theory was the study of the possibilities given by the axiom schema (3.3.1).




Proof of Theorem 10: It is sufficient for each closed formula F to establish a corresponding absolute
formula $\hat F$ such that (AI + $\neg$(CT)) $\vdash \neg F \Leftrightarrow \hat F$, and if F itself is absolute, then $\neg F \Leftrightarrow \hat F$ is deducible
in first order arithmetic. Besides, one needs to show that every axiom F of (AI + $\neg$(CT)) will be
converted into a theorem $\neg \hat F$ of first order arithmetic and the rules of deduction will be converted into
derivative first order deduction rules. We shall indicate the transformation of F into $\hat F$ and explain its
meaning without writing out all routine formal deductions. Due to Proposition 9 it is sufficient to
restrict ourselves to formulas of the kind $F = \forall \alpha \exists \beta P(\alpha, \beta)$, where P is absolute. We say that F is
rejected on $\gamma \in \Omega$ if for any recursive function $r: \mathbb{N} \to \mathbb{N}$ it is false that for any recursive operator $k: \Omega \to \Omega$
applicable to $\gamma$, $k' = r(k)$ is also applicable to $\gamma$ and $P(\alpha, \beta)$ holds, where $\alpha = k(\gamma)$ and $\beta = k'(\gamma)$. Let
$\mu$ be a recursive continuous measure. It turns out that the equivalence between $\neg F$ and the formula
"F is rejected for $\mu$-almost all $\gamma$" is deducible in AI + $\neg$(CT).



The latter formula can be written in an absolute form and chosen as $\hat F$. The point is that the
quantifier "for almost all $\gamma$", in contrast to the quantifier "for all $\gamma$", is expressible in the first order
language. Obviously the formula "F is rejected on $\gamma$", being absolute, can be presented in the form
$\forall n_k \neg \forall n_{k-1} \neg \ldots \forall n_0 \neg R(\gamma, n_0, n_1, \ldots, n_k)$, where R is a recursive predicate, monotonic in each of the
arguments $n_i$ (up for the even i and down for the odd ones). Let us show by means of induction on
i how the predicate $\mu\{\gamma : \forall n_{i-1} \neg \forall n_{i-2} \neg \ldots \forall n_0 \neg R(\gamma, n_0, \ldots, n_k)\} > r$ is expressed by an absolute formula.
For i=0 it is trivial. Now let, at the given i, our predicate be expressed in the form $S_i(r, n_k, \ldots, n_i)$.
Then $\forall n_i \forall r' > (1-r)\ \neg S_i(r', n_k, \ldots, n_i)$ can serve as $S_{i+1}(r, n_k, \ldots, n_{i+1})$. Thus, it remains to show that $\neg F$
is equivalent in AI + $\neg$(CT) to the assertion "F is rejected for $\mu$-almost all $\gamma$".



Lemma 5: Let $\mu$ and $\mu'$ be r.e. measures and $\mu$ be continuous. Then recursive operators P
and P' on $\Omega$ exist such that:

1) $\forall A \subset \Omega\ \mu'(A) = \mu(P^{-1}(A))$

2) $\forall \omega\ \neg (P(P'(\omega)) \neq \omega \neq P'(P(\omega)))$

3) P' (respectively P) is defined on $\mu'$- (resp. $\mu$-) almost all non-recursive sequences.



The proof of this lemma follows from Theorem 3.1 b) in [L70]. Since the property "F is rejected
on $\gamma$" is invariant with respect to any recursive reversible transformation of $\gamma$, it is sufficient to prove
the equivalence of $\neg F$ to "F is rejected for $\mu$-almost all $\gamma$" just for $\mu = B_2$ (the uniform measure
on $\Omega_2$). By virtue of the same invariance and Kolmogorov's 0-1 law (see [K33]), the set A of all $\gamma$
on which F is rejected can be only of measure 0 or 1 with respect to $B_2$. Hence if R is the set of
all recursive sequences, the measure of $(A \cap \neg R)$ or of $(\neg A \cap \neg R)$ equals 0 with respect to any other
recursive $\mu$ as well. Then by virtue of Theorem 8 a sequence exists (and it can easily be defined
by an absolute formula) on which all complete $\gamma$ from this set depend. The axioms of AI + $\neg$(CT)
imply that any universal sequence (from axiom 3.2.3) is non-recursive, equivalent to a complete one,
and independent of sequences defined by absolute formulas. Therefore in the case $\mu(A) = 0$, F
is not rejected on a universal $\gamma$ and $\neg\neg F$ holds. In the opposite case $\neg F$ holds by analogous reasons.
These reasonings can be easily transformed to formal proofs in AI + $\neg$(CT). Each of the two cases
gives implication in one of the directions between $\neg F$ and "F is rejected for $\mu$-almost all $\gamma$". Q.E.D.






4. APPLICATION TO THE THEORY OF TURING DEGREES 

4.1 INDEPENDENCE AND NEGLIGIBLE SETS 

One of the natural fields for application of algorithmic information theory is the theory of 
Turing degrees. It is natural to interpret the recursive reducibility of a to as that /? contains 
all information about a, more accurately, all information except a finite amount equal to the com- 
plexity of the reducing algorithm. However, the informational concepts are subtler and less awkward 
than reducibility degrees. In particular, the first concepts unlike the latter ones, are invariants 
applicable to finite objects as well (Theorem 4 shows that I(x,y) is invariant to within a constant by 
all recursive reversible transformations of N). Algorithmic information theory gives new interesting 
possibilities. One of them is the introduction of the concept of independence in addition to the 
concept of reducibility. In the language of reducibility degrees it would be possible also to say that 
α and β are independent if any sequence reducible to both of them is trivial (recursive). But a 
simple example shows that this definition does not agree with intuition. Let α and γ be 0,1-sequences 
obtained in random processes of independent trials, where the probability of α_n = 0 is 1/2 and the 
probability of γ_n = 0 is 0.99. Let β_n = α_n ⊕ γ_n. Then α and β are almost always such that no 
nontrivial sequence reduces simultaneously to both of them, though 99 percent of the contents of α 
and β coincide (in view of which it is hard to consider them independent). 
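
The coincidence claim in this example can be checked numerically. What follows is only a minimal sketch: it assumes that β is formed by adding α and γ modulo 2 (so that β differs from α exactly where γ is 1), and it uses a pseudorandom generator as a stand-in for the random trials. The function name, the seed and the sample length are illustrative choices and do not come from the text.

```python
import random


def simulate_agreement(n, p_gamma_zero=0.99, seed=0):
    """Return the fraction of positions where alpha and beta coincide.

    alpha_n is a fair bit, gamma_n is 0 with probability p_gamma_zero,
    and beta_n = alpha_n XOR gamma_n (an assumed reading of the example).
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    agree = 0
    for _ in range(n):
        alpha = rng.randrange(2)                         # P(alpha_n = 0) = 1/2
        gamma = 0 if rng.random() < p_gamma_zero else 1  # P(gamma_n = 0) = 0.99
        beta = alpha ^ gamma                             # addition modulo 2
        agree += (alpha == beta)
    return agree / n


if __name__ == "__main__":
    # The printed fraction should be close to 0.99.
    print(simulate_agreement(100_000))
```

The two sequences agree at a position exactly when γ is 0 there, so the observed fraction approximates 0.99. The simulation illustrates only the overlap of contents; it says nothing about reducibility, which is the point of the example.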

We shall use the concept of independence from Chapter 1 for the definition of the concept of 
"negligible sets" of sequences. This gives us the possibility of studying properties of Turing degrees 
"to within this negligibility". Many exotic types of Turing degrees are known. Such are, for example, 
"minimal" degrees containing indivisible information (any part of the information of such a degree 
β, i.e. a degree α ≤ β, is equivalent to 0 or β). The existence of such degrees is proved by diagonal 
methods, and it would be strange for the corresponding sequences to occur in reality. In particular, it was 
proved (see [P67]) that such sequences cannot appear in any combination of random and recursive 
processes. One may hope that many complications of the theory of Turing degrees are 
caused by exotic examples of this kind, and that the theory of "real degrees" is simpler. We shall see 
below that this is so, but only partially. We call a set A ⊆ Ω inaccessible if its complement 
is closed under every recursive operator F (i.e. ∀α (α ∉ A ⇒ F(α) ∉ A)). 

Proposition 11: The following four properties of a set A ⊆ Ω are equivalent: 

1) A sequence α ∈ Ω exists on which all β ∈ A are dependent (i.e. ∃α ∀β ∈ A: I(α:β) = ∞). 

2) A is a subset of some inaccessible set A_1, any r.e. measure of which is 0. 

3) A is a subset of some inaccessible set A_1 of measure 0 with respect to some r.e. measure μ not 
concentrated on a countable set (μ(¬B) > 0 if B is countable). 

4) M(A) = 0. 

Proof: 1)⇒4) and 4)⇒1) follow from Theorem 7 and Lemma 3, respectively. It is obvious that 
F(M), the image of M under an arbitrary recursive mapping F: Ω → Ω, is an r.e. semimeasure and hence 
F(M) ≺ M. Therefore, if M(A) = 0, then A_1 = ∪_F F^{-1}(A) is inaccessible, A ⊆ A_1, and M(A_1) = 0. This gives 
4)⇒2)⇒3). Lemma 5 implies that 3)⇒2). Any r.e. semimeasure is the image of an r.e. measure under 
a recursive mapping Ω → Ω (see [L70], section 3.2). This gives 2)⇒4). Q.E.D. 

We call negligible the sets having any of these four properties (this neglect is, of course, based 
on our belief in the Independence Principle). We call two sets A and B i-equivalent if their symmetric 
difference is negligible. A "property of Turing degrees" means a set A ⊆ Ω invariant with respect to 
Turing equivalence. Studying such properties to within i-equivalence, we can exclude from consideration some 
properties of "exotic", "unreal" degrees, which simplifies the study. We denote by K the Boolean 
algebra of Borel sets of Turing degrees, and by L its factor algebra with respect to i-equivalence. 
If A ∈ K, then A ∈ L denotes the element generated by A, i.e. the i-equivalence class containing A. 

4.2 TYPES OF TURING DEGREES 

In (1.5) the concept of "sequence completeness" was considered. The set of incomplete sequences 
has a property very close to negligibility. Namely, Item 2) in Proposition 11 is obtained from Item 1) 
of Proposition 5 by omitting the word "total". Thus, incomplete sequences cannot arise in a process 
completable to a total one (in particular, in a process with the working time bounded by a total 
recursive function). It is natural to consider properties of the Turing degrees generated by complete 
sequences. It turns out that they are organized quite simply: up to i-equivalence, there are only four of them. 

Theorem 12: Let A ⊆ Ω be the closure with respect to Turing equivalence of a Borel set of 
complete sequences. Then A is i-equivalent to one of the four sets: 

a) The empty set; 

b) The set of recursive sequences; 

c) The set of all complete sequences; 

d) The set of all complete non-recursive sequences. 

Thus, the properties of a complete sequence (to within i-negligible sets) depend only on its 
recursiveness, and these sequences form the two most natural elements (atoms) of the algebra L. 

Proof: As follows from Lemma 5, any set A of non-recursive sequences invariant with respect 
to Turing equivalence either has measure 0 with respect to every recursive measure μ, or (for every μ) contains 
μ-almost all non-recursive sequences. Then, by virtue of Theorem 8, a γ exists such that all the 
complete non-recursive sequences either from A or from the complement of A, respectively, depend 
on γ. Taking into account that the invariant set A contains either all recursive sequences or none, 
we obtain that A is i-equivalent to one of the four sets, mentioned in Theorem 12. Q.E.D. 

Let us make a few remarks about the remaining types of Turing degrees (those containing no complete sequence). 
Even the proof that their union is not a negligible set turns out to be very non-trivial. It has been 
given by V. V'jugin [L77] who proved that L contains an infinitely divisible element and a countable 
number of atoms. Only two of them (namely, b) and d) of Theorem 12) contain complete sequences. 
V'jugin's constructions are very complicated. A portion of his proofs has not been published yet. 

5. BRIEF REFERENCES AND THE BIBLIOGRAPHY 

The following remarks do not claim to present the history of the question and concern mainly the 
works directly used above. Algorithmic information theory originated with A.N. Kolmogorov's 
and R.J. Solomonoff's algorithmic approach to the concepts of information, randomness and a priori 
probability. This idea was based on the fundamental discovery of an optimal (up to an additive 
constant) coding method for constructive objects and a recursively invariant concept of complexity 
arising from it (cf. [K65, S64]). "Uspekhi Mat. Nauk" reported on Kolmogorov's talks on this 
subject at the Moscow Mathematical Society in 1961 and subsequent years. Some of R.J. Solomonoff's 
ideas in the field were also given in preprints and in [M62]. See also [Ma64] and [Ch66]. 

However, in spite of the depth of the main idea, the accuracy of the mathematical expression of 
the basic quantities was not perfect. Many important relationships hold only up to an error of the order of 
the logarithm of the complexity. This error is of course negligible in comparison to the complexity 
itself, but it can exceed such derived quantities as mutual information, deficiency of randomness, or 
conditional a priori probability. This is connected with the fact that subtraction and division are 
used in the expression of the latter quantities. Thus, the main terms, of the order of the complexity, 
can cancel, and only terms smaller than the logarithm of the main ones remain. Therefore, 
these errors distorted the picture very much and hindered the development of a transparent theory. 

With respect to the concept of randomness, these difficulties were overcome in the very important 
work [ML66]. But the concept of random sequences proposed there was related only to recursive 
measures and did not cover other important cases. Some other difficulties were overcome in [L70], 
where we introduced the concepts of the universal measure as the a priori probability and complexity 
as its logarithm. Very interesting studies of the concept of randomness were made by C.P. Schnorr [Sc71]. 

For the concept of information, the problem of giving a precise definition proved to be more 
difficult. The first non-trivial results were obtained by A.N. Kolmogorov and L.A. Levin in 1967. 
The initial definition of the mutual information [K65] was non-symmetric and was monotonic 
in only one of its arguments. Kolmogorov and Levin [K68, L70] demonstrated that this quantity 
coincides approximately (up to a logarithm of the complexity) with a symmetric expression and 
therefore is approximately monotonic in its second argument as well. This yields the intuitive 
and theoretically desirable property that a given text contains no less information about any given 
pair of texts than about either of them. 

In [L70] the universal measure was introduced. Its logarithm (equal to the length of the shortest 
code over the optimal self-delimiting algorithm) turned out to be a more satisfactory complexity 
measure on K than the original proposal from [K65]. It allowed improvement of the definitions 
of randomness ([L73]) and information ([L74]). The new definition of information was monotonic 
with a constant (instead of logarithmic) error and can be extended to the case of infinite sequences. 
This work is connected with the very subtle and non-trivial results of P. Gacs [G74] concerning the 
differences between the symmetric and the asymmetric expressions for information. A number of the 
results of [K68, L70, L73, L74, G74] were rediscovered independently by G. Chaitin in his famous 
work [Ch75]. Versions of some results of the present work were reported in [L74, L76, L77]. 